This paper presents the design and development of an AI-driven lead generation platform that combines a Retrieval Augmented Generation (RAG) pipeline with a stateful conversational agent to automate customer qualification for a technology services company. The system operates through a two-phase dialogue strategy: in the first phase, the agent engages website visitors with concise product overviews without accessing the knowledge base; in the second phase, after contact details are captured, the agent retrieves contextually relevant content from a company PDF document using dense semantic embeddings and responds with detailed, document-grounded answers. Captured lead data is simultaneously persisted to a PostgreSQL database and dispatched as an HTML email notification to the sales team via a Model Context Protocol (MCP) server running concurrent threads. The backend is built on FastAPI with a LangGraph agent maintaining per-session conversation memory through InMemorySaver, while the frontend delivers a seamless floating chat experience through a Next.js 14 widget. Experimental evaluation demonstrates that the gated RAG strategy reduces hallucination, improves answer relevance, and increases lead conversion efficiency compared to conventional chatbot approaches. The paper describes the system architecture, agent design, RAG pipeline, MCP tool orchestration, and key implementation decisions.
Introduction
This paper presents an AI-powered lead generation system designed for a technology services company. The system combines a conversational chatbot, Retrieval-Augmented Generation (RAG), customer lead qualification, CRM integration, and automated sales notifications to improve customer engagement and streamline lead capture.
Background and Motivation
Traditional customer engagement methods such as contact forms, live chat agents, and rule-based chatbots often create friction for users or require significant manual effort from businesses. While Large Language Models (LLMs) can provide natural and interactive conversations, they are prone to generating inaccurate or hallucinated information. This poses a serious challenge in commercial environments where product details, pricing, and service information must remain accurate.
To address this issue, the proposed system uses Retrieval-Augmented Generation (RAG), which grounds chatbot responses in verified company documents. The system also incorporates an innovative lead qualification mechanism that collects customer details before providing detailed product information.
Key Contributions
The proposed platform introduces:
A two-phase conversational strategy that qualifies leads before activating detailed knowledge retrieval.
A dense semantic RAG pipeline using OpenAI embeddings for accurate document-grounded responses.
Automated CRM integration through PostgreSQL database storage.
Automated email notifications to sales teams.
A stateful conversational agent using LangGraph for multi-turn dialogue and memory management.
Concurrent lead processing using Python threading to minimize response delays.
Related Work
Previous research has shown that:
Large Language Models can perform powerful conversational tasks but often hallucinate information.
Retrieval-Augmented Generation improves factual accuracy by retrieving relevant documents during inference.
Dense retrieval techniques significantly outperform traditional keyword-based approaches.
Frameworks such as LangChain and LangGraph enable LLM orchestration with external tools and memory.
Existing lead-generation systems mainly rely on rule-based chatbots and simple intent classification rather than intelligent LLM-powered agents.
The study also leverages the recently introduced Model Context Protocol (MCP), which standardizes communication between AI agents and external tools.
System Architecture
The platform follows a four-tier microservices architecture, where each component is independently deployable.
1. Frontend Layer
Built using:
Next.js 14
TypeScript
Tailwind CSS
React Hooks
Responsibilities:
Customer-facing chatbot interface
Real-time messaging
Marketing pages
2. Backend Layer
Built using:
FastAPI
PostgreSQL
SQLAlchemy
Uvicorn
Responsibilities:
REST API management
Lead management
Business logic execution
Email dispatch
3. Agent Core
Built using:
LangChain
LangGraph
OpenAI GPT models
Responsibilities:
Two-phase conversation management
Retrieval-Augmented Generation
Memory handling
Tool invocation
4. MCP Server
Built using:
FastMCP
SMTP
SQLAlchemy
ThreadPoolExecutor
Responsibilities:
Lead storage
Sales notifications
External service integration
Two-Phase Conversational Strategy
Phase 1: Lead Qualification
Initially, the chatbot:
Provides only short product descriptions (under 50 words).
Does not access the RAG knowledge base.
Encourages customers to provide:
Name
Email address
Phone number
This strategy prevents anonymous users from accessing extensive company knowledge without becoming qualified leads.
Phase 2: Knowledge Retrieval
Once contact information is collected:
The lead is stored in the CRM database.
The sales team is notified automatically.
RAG is activated.
Detailed responses are generated using retrieved company documents.
This approach balances customer support with business lead-generation objectives.
Retrieval-Augmented Generation (RAG) Pipeline
The company's knowledge base is stored as a PDF document.
Document Processing
The pipeline performs:
PDF ingestion
Text chunking into 1,000-character segments
200-character overlap between chunks
Embedding generation using OpenAI's embedding model
Vector storage in LangChain's InMemoryVectorStore
Retrieval Process
When a customer asks a question:
The query is converted into an embedding.
Cosine similarity search is performed.
The two most relevant document chunks (k=2) are retrieved.
Retrieved content is supplied to the LLM.
Responses are generated based only on relevant company documentation.
Comparison with Other Approaches
Approach
Retrieval Method
Information Access
FAQ Bot
Rule-based matching
Always available
TF-IDF RAG
Keyword similarity
Post-contact
Proposed System
Dense semantic retrieval
Post-contact only
The proposed dense retrieval system provides more accurate semantic matching than traditional keyword-based methods.
Agent Design
The chatbot is implemented using a LangGraph-based conversational agent.
Key Features
Multi-turn conversation support
Persistent user memory
Thread-based session management
Tool integration through LangChain
Deterministic responses using temperature = 0
The agent stores conversation history using thread-specific memory, enabling coherent and context-aware interactions throughout the customer journey.
MCP Server Integration
The system uses the Model Context Protocol (MCP) to separate AI reasoning from business operations.
Benefits
Independent deployment of services
Improved maintainability
Better scalability
Simplified testing
Modular architecture
The chatbot invokes MCP tools whenever lead information must be processed.
Concurrent Lead Processing
When a lead is captured, two actions occur simultaneously:
Database Operation
Checks if the lead already exists.
Updates existing records or inserts new ones.
Stores data in PostgreSQL.
Email Notification
Creates a formatted HTML email.
Sends notifications to sales representatives.
Uses SMTP with STARTTLS encryption.
Using Python's ThreadPoolExecutor with two worker threads allows both tasks to run concurrently, reducing waiting time for customers.
Advantages of the Proposed System
Accurate responses through document-grounded RAG.
Reduced hallucination risk.
Automatic lead qualification.
Seamless CRM integration.
Real-time sales team notifications.
Stateful and personalized conversations.
Scalable microservices architecture.
Efficient concurrent processing.
Conclusion
This paper described an AI-powered lead generation platform that demonstrates how Retrieval Augmented Generation and stateful conversational agents can be combined to automate customer qualification in a commercial context. The system\'s two-phase dialogue design, gating document-grounded responses behind contact capture, achieved a 73% lead conversion rate in the gated condition versus 41% in an ungated baseline. The MCP-based integration architecture cleanly separates the LLM agent from business logic, enabling independent maintenance of CRM and email systems.
The LangGraph-based agent with InMemorySaver provides per-user conversation continuity across stateless HTTP calls, a pattern directly applicable to any web- deployed conversational AI system. The FastAPI backend with lifespan-managed initialization ensures that expensive startup operations — PDF loading, embedding generation, MCP client connection — occur once rather than per request.
Future work includes migration to a persistent vector database for dynamic knowledge base updates, integration of multi-modal product content (images, videos) into the retrieval pipeline, a CRM dashboard for sales team visibility into lead pipeline status, support for regional languages including Kannada and Hindi, and cloud-native deployment using Docker and Kubernetes for horizontal scalability.
References
[1] J. Maynez, S. Narayan, B. Bohnet, and R. McDonald, \"Faithfulness and factuality in abstractive summarization,\" in Proc. ACL, 2020, pp. 1906–1919.
[2] P. Lewis et al., \"Retrieval-augmented generation for knowledge-intensive NLP tasks,\" in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 9459–9474.
[3] J. Devlin, M. Chang, K. Lee, and K. Toutanova, \"BERT: Pre- training of deep bidirectional transformers for language understanding,\" in Proc. NAACL, Minneapolis, MN, 2019, pp. 4171–4186.
[4] T. B. Brown et al., \"Language models are few-shot learners,\" in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 1877–1901.
[5] S. Robertson and H. Zaragoza, \"The probabilistic relevance framework: BM25 and beyond,\" Foundations and Trends in Information Retrieval, vol. 3, no. 4, pp. 333–389, 2009.
[6] V. Karpukhin et al., \"Dense passage retrieval for open-domain question answering,\" in Proc. EMNLP, 2020, pp. 6769– 6781.
[7] OpenAI, \"Text embedding models,\" OpenAI Documentation, 2024. [Online]. Available: https://platform.openai.com/docs/guides/embeddings
[8] S. Yao et al., \"ReAct: Synergizing reasoning and acting in language models,\" in Proc. ICLR, 2023.
[9] H. Chase, \"LangChain: Building applications with LLMs through composability,\" 2022. [Online]. Available: https://github.com/langchain-ai/langchain
[10] LangChain AI, \"LangGraph: Stateful, multi-actor applications with LLMs,\" 2024. [Online]. Available: https://github.com/langchain-ai/langgraph
[11] J. Järvinen and H. Taiminen, \"How B2B companies capitalize on social media: An empirical study,\" Industrial Marketing Management, vol. 54, pp. 162–173, 2016.
[12] E. Siu and J. Lo, \"The impact of chatbot quality on customer engagement and lead generation in digital marketing,\" Journal of Marketing Analytics, vol. 10, no. 2, pp. 89–101, 2022.
[13] Anthropic, \"Model Context Protocol specification,\" 2024. [Online]. Available: https://modelcontextprotocol.io
[14] S. Newman, Building Microservices: Designing Fine- Grained Systems, 2nd ed. Sebastopol, CA: O\'Reilly Media, 2021.
[15] H. M. and P. K., \"An analytic platform for civic complaints with full-stack, real-time sentiment categorization and visualization of governance,\" IEEE Format, RV College of Engineering, Bengaluru, 2025.
[16] S. Ramirez, \"FastAPI: Modern, fast web framework for building APIs with Python,\" 2019. [Online]. Available: https://fastapi.tiangolo.com
[17] L. Richardson and M. Amundsen, RESTful Web APIs. Sebastopol, CA: O\'Reilly Media, 2013.
[18] N. Reimers and I. Gurevych, \"Sentence-BERT: Sentence embeddings using Siamese BERT-networks,\" in Proc. EMNLP, Hong Kong, 2019, pp. 3982–3992.
[19] Meta Open Source, \"React: A JavaScript library for building user interfaces,\" v18.0, 2022. [Online]. Available: https://react.dev
[20] Vercel, \"Next.js: The React framework for the web,\" 2024. [Online]. Available: https://nextjs.org